All of the things in this document
Follow us on Twitter!
Charles T. Gray plays on normal tidyverse difficulty setting.
Gaming difficulty:
What about our other presenters?
We are here to learn from you, too.
The plan is to live code in these notes, editing them as we go with the wonderful xaringan::infinite_moon_reader() function and incorporating your feedback.
We will all work from a personal copy of these slides.
After some introductory waffling, and set-up, we’ll download the file together.
You will be able to annotate these notes or write extra code examples for yourself.
Indeed, there is little code provided in these notes. But there are lots of functions. We will write the code together.
So that you have the experience of getting your hands dirty with code, we’ll use pseudo code.
<this is pseudo code>
install.packages("<you write code here>")
If I wanted to install a package called metafor, I would use the code
install.packages("metafor")
Notice that we drop the <>. That’s not code, it’s pseudo code to highlight where your input is.
Don’t be afraid to harness all tools available:
<x> in <language>”.
Finally, everyone needs to ragequit sometimes.
Yup, all good.
Some assistance would be greatly appreciated.
Also, I think I broke the internet.
I did it already. Trying an extension question, want to join me?
R users are data detectives.
We’re going to try to cover everything except Model.
This image is from R for Data Science, a great text to get started with. It’s available online as a free ebook.
For us to learn how to be data detectives, we’ll need some data.
To familiarise ourselves with R, we will do what R users do.
We will explore a dataset.
A fitting dataset.
A few images to explain why a dataset about witch trials might be appropriate for a workshop hosted by an advocacy group for underrepresented genders.
The inspiration for this workshop was an analysis by Steph de Silva, of useR! 2018 keynote renown.
Steph, please tell us about the paper that generated this dataset.
Steph delights all and sundry.
Thanks, Steph!
Group discussion
What would you like to know?
Write questions on whiteboard
For this workshop:
“The plane is pretty boring without the airport around it.”
(Tip of the hat to Julia Lowndes for the aeroplane analogy.)
Installation instructions adapted with appreciation from a previous workshop.
Go to the Comprehensive R Archive Network (CRAN) website.
It was first in a Google search for ‘cran’ in June 2018.
Go to the RStudio website.
It was first in a Google search for ‘rstudio’ in June 2018.
Choose RStudio and scroll down to the blue Download RStudio Desktop button.
Click the green button to download RStudio Desktop Open Source License and select the appropriate installer for your operating system.
Double click the installer and follow the prompts to set up RStudio.
Working in an RStudio project has many benefits.
R-Ladies presenters gesticulate wildly at RStudio
Let’s start with a couple of useful panes.
Help-Cheatsheets-RStudio IDE Cheatsheet
The console is where you can execute single-line R commands.
The console is located, by default, in the lower left pane.
Try 3 + 2 and press enter:
# Type 3 + 2 here. Here's hoping infinite moon reader works.
3 + 2
## [1] 5
I can store the number 5 in an object x.
To assign a value we use an arrow <-.
x <- 5
What happens when you type x into the Console after assigning the value 5 to it?
What do you see in the Environment pane?
If I assign the value 5 to the object x, and call it in the console, it returns the value assigned.
# Assign 5 to x and call x.
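One way to fill in the chunk above, as a minimal sketch: the first line stores the value, the second line calls the object so that R prints it.

```r
# Assign 5 to x with the arrow, then call x to see what it holds.
x <- 5
x
## [1] 5
```

After running this, x should also appear in the Environment pane.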
There are many complex data types in R.
Data objects can be made of
Or tables of combinations of the above.
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
And much more.
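To make this concrete, here is a small sketch of a few common building blocks and a table made from them. The object names and values are made up for illustration; they are not part of the workshop data, and this is only a sample of what R offers.

```r
# Illustrative values only; none of these come from the workshop data.
a_number  <- 5.1       # a number
some_text <- "setosa"  # a character string
a_logical <- TRUE      # a logical value (TRUE or FALSE)

# A vector holds several values of the same type.
lengths_cm <- c(5.1, 4.9, 4.7)

# A data frame is a table; each column can hold a different type.
flowers <- data.frame(
  sepal_length = lengths_cm,
  species = c("setosa", "setosa", "setosa")
)

# class() tells us what kind of object we have.
class(a_number)
class(some_text)
class(a_logical)

# str() gives a compact overview of a table.
str(flowers)
```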
The witch trials dataset is a table.
It is a tidy data structure:
We’ll do this analysis in R markdown.
Open File-New File-R Markdown
Follow the prompts to install the required packages.
Give your document a title and press ok.
This will open an Untitled2 template.
Save your document - note, you have given your document a title, not saved it!
Press the wheel next to the knit icon at the top of the Editor pane and select Viewer.
Now we will knit our document.
A treasure trove!
Delete everything in your .Rmd file.
Look up softloud charles gray github
You want to navigate to: https://github.com/softloud
Click on rcurious in my Pinned repositories.
Click on workshop and then workshop.Rmd and then click the Raw button and copy and paste the text into your .Rmd file.
Press Control + Shift + K to knit.
But it is annoying having a pop out window, so let’s view these notes in the Viewer pane.
We are going to modify the YAML at the top of the document so that you can see how to toggle a presentation into notes.
Press the little document outline button in the top right corner of the Editor pane, and navigate to the top of the document.
output:
  # ioslides_presentation
  html_document:
    toc: true
    toc_float: true
Knit again. (This takes a while.)
From now on, we’ll refer to both the notes and the slides.
Your notes are made from the same code as the slides we’re looking at.
If you want information from a few slides ago, you can find it in your notes.
Navigate in the Viewer pane using the floating table of contents to the image of Hillary Clinton in the section called A fitting dataset.
Navigate back to this section in your Editor pane by using the Document Outline provided by the button in the top right of the Editor.
Now you can take notes directly into the script file for these slides.
Let’s create a section at the top of the document called Useful tips.
We’ll create a heading directly above the section called Hi. (Use your Document Outline to navigate there.)
It looks like this at the top of the script file.
## Hi!
We make headings with #. The more ###, the smaller the heading.
We can add bullet points with -. So,
- Euclid
- Beanie
renders in your output like this:
You can make tables, insert images and videos, and much more!
There are two cheatsheets on RMarkdown in Help-Cheatsheets.
Ask around for a “what I wish I’d known” story about R or investigate a cheatsheet.
Find a tip that sounds useful to you and add it to your notes in your Useful tips section as a bullet point.
Knit your document and bask in the pretty!
Try deleting the images. You can always download a fresh copy of the notes if you want to get them back.
Packages are collections of other people’s code.
Often someone has already written a script that does what you want to do.
For example, we want to import the witch trials data. We will use a package that helps with data wrangling tasks like this, the tidyverse.
We’re going to use the metapackage tidyverse to help us with our data analysis.
The most common elements of packages are functions. R also comes preloaded with a base of commonly used functions.
Functions run other people’s code for us, so that we don’t have to reinvent the wheel. We will use functions to install and load the tidyverse.
<function()>
To learn more about a function, type ?<function name> into the Console, and the Help pane will display documentation.
We want to install the package tidyverse.
Take a look at the help documentation for the function install.packages().
install.packages("<name of package>")
library(<name of package>)
A friendly reminder that <> indicates pseudo code. You can navigate back to that section in your notes in your Viewer.
We would like to install and use a package called “tidyverse”.
Let’s try.
Press the little document outline button in the top right corner of the Editor pane.
Navigate to this part of your notes and add code in the chunk that will load the tidyverse package.
# This is a code chunk.
# We can write informative comments with a hash # at the start.
# Load the tidyverse using the library() function.
# Press the green arrow in the top right corner of the chunk to run!
# Don't forget, you need to install the package before you can use it.
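# install.packages("tidyverse")
# librarian::shelf() installs a package if it is missing and then loads it.
# library(tidyverse) also works once the package is installed.
librarian::shelf(tidyverse)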
## Warning in librarian::shelf(tidyverse): cran_repo = '@CRAN@' is not a valid URL.
## Defaulting to cran_repo = 'https://cran.r-project.org'.
## ── Attaching packages ────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.0.0 ✔ purrr 0.2.4
## ✔ tibble 1.4.2 ✔ dplyr 0.7.4
## ✔ tidyr 0.7.2 ✔ stringr 1.3.1
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## Warning: package 'ggplot2' was built under R version 3.4.4
## Warning: package 'stringr' was built under R version 3.4.4
## ── Conflicts ───────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
Since the data is stored on an online repository, we can import it via url.
We can import this data using the read_csv() function from the tidyverse.
This function takes one argument, the url, which goes between the () as a “character string”.
There are many file types, which often need special care.
Presenters discuss different file types and old battle stories of importing data.
The data is found here: “https://raw.githubusercontent.com/JakeRuss/witch-trials/master/data/trials.csv”
Try importing the data at the console with read_csv. What output do you see?
Calling read_csv with the url as its argument produces a data object, and an object is something we can assign.
Open a code chunk here and read the data in using read_csv and assign <- the data to an object called witchdat
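One possible version of that chunk, as a sketch. It uses the url given in these notes and needs an internet connection; library(tidyverse) is repeated here only so the chunk runs on its own.

```r
library(tidyverse)

# The url goes between the () as a character string, so it needs quotes.
witchdat <- read_csv(
  "https://raw.githubusercontent.com/JakeRuss/witch-trials/master/data/trials.csv"
)

# Calling the object prints the first rows of the table.
witchdat
```

Because the result is assigned with <-, the table lands in the Environment pane instead of only being printed in the Console.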
What do you see in your Environment?
Click on it!
Let’s explore the information in this table.
Lots of objects in R are friendly to the summary() function: summary(<an R object>).
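As a quick sketch of what summary() does, here it is on the preloaded iris data rather than the witch trials data. The name sepal_summary is just a label chosen for this example.

```r
# summary() on a single numeric column gives min, quartiles, mean and max.
sepal_summary <- summary(iris$Sepal.Length)
sepal_summary

# summary() on a whole table summarises every column in turn.
summary(iris)
```

The same call works on other kinds of objects, which is why it is a handy first look at anything new.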
What is the output of summary for witchdat?
An alternative is the skim() function from the skimr package.
Install the skimr package.
Load the skimr package in your notes.
Apply the skim function to the witchdat data.
What is the difference between the output of summary and skim? Which do you like better and why?
Based on this new information, what questions do we add or update?
At this point, we often wish to manipulate the data in some way.
This is variously known as wrangling, cleaning, and scrubbing.
This workshop is based on Steph de Silva’s wonderful analysis Witch Hunting in Europe: a discovery of missingness.
One of the first things that Steph does is change the name of the column gadm.adm0 to something more human-interpretable, country.
Steph notes, “This is a gross oversimplification of geography in the middle ages of Europe, but it describes the location of the trial in terms that will be most familiar to many modern users.”
Let’s use the rename() function to change the name of the variable (column) gadm.adm0 to country.
To do this, we’ll learn a very useful operator, the pipe %>%.
Piping makes code easier to read (arguably).
For example, we saw a snapshot of the preloaded iris data earlier.
The head() function takes one argument, a table:
head(<some data table>).
But we could also pipe %>% the data into the function.
<some data table> %>% head()
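Here are both forms side by side on the preloaded iris data, as a small sketch. The pipe comes with the tidyverse, so we load it first.

```r
library(tidyverse)  # provides the pipe %>%

# The usual way: the table goes inside the brackets.
head(iris)

# The piped way: the table is sent into head() from the left.
iris %>% head()
```

Both lines return the same first six rows. The pipe pays off once several steps are chained together.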
Use the pipe operator to present the top of the witchdat dataset.
We can rename a column by constructing a pipe from the table to the rename function:
<my data> %>%
  rename(<newname> = <oldname>)
Pipe witchdat to the rename function and change gadm.adm0 to country.
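If you would like to see the pattern on something small first, here is a sketch with a made-up table. toy_trials and its values are invented for illustration and stand in for witchdat; only the column names match the real data.

```r
library(tidyverse)

# A made-up three-row table standing in for witchdat; the values are invented.
toy_trials <- tibble(
  year = c(1520, 1580, 1640),
  deaths = c(1, 4, 2),
  gadm.adm0 = c("Scotland", "Germany", "France")
)

# rename() takes <newname> = <oldname>.
toy_renamed <- toy_trials %>%
  rename(country = gadm.adm0)

# Check the column names after renaming.
names(toy_renamed)
```

Note that rename() does not change the original table. The new name only sticks if we assign the result to an object, as we do here with toy_renamed.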
The tidy way of plotting data is with the ggplot2 package, which comes with the tidyverse.
<some data> %>% # pipe the data to ggplot()
  ggplot()
What happens when you %>% the witchdat table into ggplot()?
We define x and y axes of the plot with aesthetics in ggplot.
<some data> %>%
  ggplot(aes(x = <column name>, y = <column name>))
What happens if you set x and y axes to column names in aes?
Let’s see how many women were murdered as witches over time.
We’ll add a plot layer + to our ggplot using geom_point for a scatterplot.
Set the x axis to year and the y axis to deaths.
<data> %>%
  ggplot(aes(x = <column with year>, y = <column with deaths>)) +
  geom_point() # Adds a scatterplot.
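The three steps can be sketched on a tiny made-up table. toy_trials and its values are invented for illustration; swap in witchdat to work with the real data.

```r
library(tidyverse)

# A made-up table standing in for witchdat; the values are invented.
toy_trials <- tibble(
  year = c(1520, 1580, 1640, 1700),
  deaths = c(1, 4, 2, 0)
)

# Step 1: piping the data into ggplot() sets up an empty plot.
p_blank <- toy_trials %>%
  ggplot()

# Step 2: aesthetics tell ggplot which columns go on which axis.
p_axes <- toy_trials %>%
  ggplot(aes(x = year, y = deaths))

# Step 3: a geom layer, added with +, draws the data as points.
p_points <- p_axes +
  geom_point()

# Calling the plot object displays it.
p_points
```

Storing each step in its own object is a choice made for this sketch so the stages can be compared. In practice the steps are usually written as one chain.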
Use the data visualisation cheatsheet (Help-Cheatsheets) to figure out how to add a title layer to your plot.
In groups or alone, choose a question to answer.
Use cheatsheets, talk to each other.
Use the leaflet package to plot where the witches died, sized by number of deaths. Make the dots a nice blood red.
I thought you might be interested in a conversation I had while preparing this workshop. The rstats community is your best resource.
How to teach R to students so they make enough autonomous cognitive links to take more away from the experience than simply successfully running the example? @samclifford @MilesMcBain @thejholloway @visnut https://t.co/PlNt3M0xY3
— Charles T. Gray (@cantabile) June 17, 2018
Sidenote, I’d recently seen Maelle Salmon speak about the benefits of blogging to engage with the community.
This was the first time I tweeted about a blogpost; it is very reassuring to have people to bounce ideas off.
We hope you had fun.